I. Introduction and Summary

We have been charged with the task of conducting a “. . .focused study addressing the
fundamental question of whether the Clearinghouse's evidence review process and
reports are scientifically valid - that is, provide accurate information about the strength of
evidence of meaningful effects on important educational outcomes.” (Our complete
charge is reproduced as Appendix A, below.)

Based on our investigation and analysis of the What Works Clearinghouse (hereafter,
WWC), we have concluded that:

(1) WWC procedures and processes for identifying and extracting information from
intervention studies are generally well documented and follow reasonable standards and
practices for systematic reviews;

(2) WWC Intervention and Topic Area Reports provide succinct and meaningful
summaries of the evidence on the effectiveness of specific education interventions.

Support for these conclusions is detailed in the remainder of the report. We have also
formed a number of specific recommendations for the continued enhancement and
improvement of WWC procedures, which are summarized in section IV. Primary among
these recommendations is that the Department of Education commission a comprehensive
review of the full range of WWC activities and procedures, with a time frame to allow a
complete consideration of a number of issues we have not been able to fully evaluate in
this report.




II. Description of the Panel’s Activities and Materials Considered 

The panel was convened in late July 2008, and held two telephone conference meetings
on August 25, 2008 and September 5, 2008, with representatives from the Institute of
Education Sciences (IES) present at both meetings. The panel also met for a full day on
September 11, 2008 with IES staff and members of the National Board for Education
Sciences. During part of the meeting the project director of WWC, Dr. Mark Dynarski,
and the deputy director, Dr. Jill Constantine, were present and answered questions from
the panel.

The panel was provided with a number of confidential documents describing the 
procedures of the WWC, including an August 18, 2008 draft of the “What Works 
Clearinghouse: Standards and Processes” (S&P) manual, and full documentation for the 
reviews of three interventions: the “Check and Connect” program, which was reviewed 
as a dropout prevention intervention, the “Tools of the Mind” program, which was 
reviewed as an early childhood education intervention, and the “Accelerated Reader” 
program, which was reviewed as a beginning reading intervention. Included in the 
materials were the study protocols for three topic areas, copies of the studies that passed 
the initial eligibility screenings, and the WWC Intervention Reports for these three 
interventions. These documents allowed the panel to assess the implementation of WWC 
procedures and standards for a relatively large group of studies (a total of 119 studies, 19
of which passed the initial eligibility screen). The panel also benefited from an extensive
series of written communications from IES and WWC staff, answering specific questions
posed before, during, and after our September 11 meeting.

III. The WWC Review and Reporting Process 

a. Goals of the WWC 

In view of the time constraints faced by the panel, the panel determined that it would 
conduct its evaluation assuming that the goal of the WWC review and reporting process 
is to assess and summarize the strength of the evidence regarding the effectiveness of 
replicable interventions (programs, products, practices, and policies).^1 The panel notes
that a more comprehensive review could (and presumably would) evaluate the mission of 
the WWC, and in particular the focus on judging efficacy of specific interventions. 

b. Overview of the Review and Reporting Process 

Building on existing research and accepted principles in the field of research synthesis, 
the panel identified six key steps in the WWC’s review and reporting process that follow 
the delineation of a topic area for systematic review^2:

1. formulation of inclusion criteria for studies in a topic area

2. development and implementation of search criteria for potentially included
studies

3. implementation of initial eligibility screens

4. review and classification of studies passing initial eligibility screen

5. extraction and synthesis of estimated effects from studies deemed to contain
useable evidence

6. summary and reporting of evidence on the effectiveness of specific interventions

^1 This is consistent with the description of the WWC at http://ies.ncee/wwc/aboutus/.

^2 The panel did not consider or try to evaluate the choice of topic areas considered by
WWC.

For each of the six steps, the panel reviewed the standards adopted by the WWC in the 
“Standards and Processes” Manual, and assessed the application/implementation of these 
standards to the three interventions for which we had complete materials. 

c. Step 1: Formulation of Inclusion Criteria 

The protocol template describes the rules that will be used to classify a study as 
“meeting” or “not meeting” evidence screens during Stage 1 of the WWC review process. 
For each topic area, the WWC protocol template covers the following issues: 

• Topic Area Focus - Defining the outcomes that the interventions should affect, 
and the particular population subgroups of interest 

• Key Definitions - Definitions of outcomes, intervention types, key subgroups 

• Inclusion Criteria - 

a. Populations 

b. Types of interventions 

c. Types of research studies 

d. Topic relevance 

e. Timeframe relevance

f. Sample relevance

g. Study design relevance 

h. Outcome relevance 

• Specific Topic Parameters 

a. Characteristics of interventions 

b. Elements of intervention replicability 

c. Outcomes relevant to the topic area 

d. Reliability of outcome measures 

e. Timeframe of review 

f. Defining characteristics of the target population

g. Characteristics relevant to equating groups 

h. Effectiveness of the intervention across different groups 

i. Effectiveness of the intervention across different settings 

j. Measuring post-intervention effects

k. Defining differential attrition 

l. Defining severe overall attrition 

m. Statistical properties important for computing effect sizes. 

These issues can be grouped into five main criteria: 

• Population(s) of interest

• Types of interventions

• Time period covered

• Types of outcomes 

• Standards of evidence 

With respect to the definition of the population of interest and the types of interventions 
to be considered in a topic area, the panel noted that WWC procedures rely on a 
combination of nominations from the field and discussions with Department of Education 
staff. Some topic areas (such as dropout prevention) necessarily involve a narrower
population, whereas others (e.g., early reading) can refer to a broader population or to a
specific subgroup. Since the choice of the target population is primarily a question of
resource allocation and not scientific appropriateness, the panel did not review this issue
further. The panel infers from existing documents and information received from IES
and WWC staff that WWC focuses on interventions that involve a well-defined set of 
activities that can be replicated in other settings (for example, “branded” interventions 
sold by publishing companies). The panel agrees with this focus, since systematic 
reviews of existing research are most likely to be informative when the intervention 
meets these criteria, and these types of interventions are likely to be of wide interest to 
the education community. 

With respect to the time period covered, the protocols have generally limited the scope to 
studies conducted in the past 20 years (with some categories such as conference 
proceedings limited to the most recent seven years). The panel believes this limitation is 
appropriate. 

With respect to outcomes of interest and standards of evidence, there are specific 
standards defined in each protocol for the following: 

• Admissible research designs 

• Reliability of outcome measures 

• Characteristics relevant to equating groups 

• Timing of measurement of post-intervention effects 

• Defining differential attrition 

• Defining severe overall attrition 

• Statistical properties important for computing effect sizes 

In general, the panel believes that the specification of minimum standards for reliability 
of outcomes, timing of measurement, attrition, and the information needed to construct 
effect sizes is appropriate for a systematic review. 

The types of research designs considered within scope for WWC include randomized 
controlled trials (RCTs) and longitudinal quasi-experimental designs (QEDs) with
pre-intervention equating. For the latter designs, the protocol specifies minimum standards
for characteristics relevant to equating the treatment and comparison groups prior to the 
treatment intervention. The panel agrees with the use of such standards. In principle, 
regression discontinuity-based quasi-experimental designs are also considered in scope, 
but minimum standards for these designs have not yet been developed, and the panel did 
not consider the standards for these designs.

Three other issues potentially relevant for specifying standards of evidence are not
explicitly covered by the protocols:

• Standards for non-compliance with assignment status 

• Specification of intensity of treatment 

• Specification of the control state 

With respect to the first of these issues, the panel notes that non-compliance with 
assignment status (also known as “crossover”) can lead to difficulties in interpreting 
intention-to-treat effects and in making comparisons across studies. Current WWC 
procedures leave the Principal Investigator(s) with discretion to cope with
non-compliance but do not require a minimum standard or specify an adjustment process.

With respect to the second issue, the panel notes that when the intensity of treatment
varies across studies (e.g., one year of exposure to treatment versus two, or an original
version of a curriculum versus a revised or enhanced version), comparing the estimated
effects from those studies becomes difficult. Similarly, the panel notes that the
precise conditions for members of the control group (in a RCT) or comparison group (in 
a QED) can vary across studies even when the treatment is held constant, potentially 
leading to difficulties in interpreting differences in estimated effects across studies. 

Finally, the panel notes that potential issues can arise when a study uses an outcome
measure that directly tests for the content of the program itself. In this case the outcome
measure is said to be “over-aligned” or to be “treatment inherent.”^3 The topic area
protocol for beginning reading specifies that RCTs with an “over-alignment problem”
will be downgraded, while QEDs will fail the review standard. The protocol for early
childhood programs does not appear to mention the issue of over-alignment. The panel
recognizes the potential biases that can arise from over-alignment but did not have
sufficient time to evaluate the impact of this issue on the WWC review process.

^3 This issue is emphasized by Slavin and Madden (2008).

d. Development and Implementation of Search Criteria

Once a topic area is identified and a protocol is established, the WWC follows an
iterative search process with two broad phases: (1) a broad search of the literature, based
on key words specified in the protocol, to identify potential interventions; and (2) a more
focused search for studies of the interventions identified in phase 1. The search
parameters (keywords) for the broad search are identified in each topic area protocol.

The process includes standard databases, a prescribed list of journals, conference
programs, the websites of developers, publishers, and various research organizations, and
direct queries to researchers and developers. WWC also receives direct submissions
from the public (including researchers and product developers) that supplement other
sources of unpublished studies.

The panel believes that the conceptual framework for searching used by the WWC is
sound, and that the use of trained librarians in combination with topic area team members
is in accord with accepted standards. However, documentation of the search procedures
actually implemented in the specific topic areas, and information on the results of this
process (such as “yield rates” from various sources) is limited, preventing the panel from
drawing stronger conclusions. The panel understands that WWC is in the process of
revising the protocol template to standardize the reporting of search procedures.

e. Implementation of Initial Eligibility Screens 

After the search for potentially eligible studies, WWC staff conduct an initial screening
based on the inclusion criteria set out in the topic area protocol. As noted above, these
standards can be grouped into five main areas:

• Population(s) of interest

• Types of interventions

• Time period covered

• Types of outcomes

• Standards of evidence

To pass the initial screening stage, studies must pertain to the specified population of
interest, during the time period covered, and must address an appropriate type of
intervention. Eligible studies must use an eligible research design (in practice, either
RCT or longitudinal QED with pre-intervention equating) with at least one “adequate
outcome measure.” The latter is defined as an instrument that has demonstrated evidence
of reliability in a national probability sample. Studies must provide “adequate outcome
reporting.” This is interpreted as requiring that the study report means and standard
deviations for the key outcome measures.

Studies that do not pass the initial screening are classified as “Does Not Meet Evidence
Screens” and excluded from further review. Studies that pass the initial screens move to
the next stage (Stage 2), in which reviewers determine whether the studies meet WWC
“evidence criteria.”

Initial screening is performed by a single reviewer who reads the title and abstract of a
written report, records information on a Study Review Guide, and determines whether or
not the study passes this stage. If there is insufficient information in the title and abstract,
the initial screen is based on the full text of a study report. The panel notes that the Study
Review Guides appear to be linked to the topic area protocols and are not consistent
across topic areas. The panel was unable to assess the completeness of the Study Review
Guides for studies of the three interventions for which we had complete materials, or to
verify how accurately these Guides are filled out.^4

^4 The materials provided for the Check and Connect review included full-text reports for
five of the six studies that were screened, and completed Study Review Guides for all six
studies. In contrast, the materials for the Accelerated Reader review report eligibility
decisions, but do not include completed Study Review Guides for all studies.

The panel believes that the general WWC approach to initial eligibility screening follows
accepted practice in the field of systematic review, by determining whether each of the
studies identified in the search process meets predetermined inclusion criteria. It appears
that WWC screeners correctly applied standards for population, type of intervention, and
date of study in the cases we were able to review. Reliability standards were not used in
the review of dropout prevention studies because psychometric outcomes were not used.
Common reliability standards (internal consistency 0.6; temporal stability 0.4; inter-rater
reliability 0.5) were used for interventions in the beginning reading topic area and the
early childhood topic area. It also appears that WWC screeners adhere to the generally
accepted principle of retaining studies if there is any doubt about their eligibility (erring
on the side of over-inclusion) until the last step of the screening process.

Nevertheless, the panel notes two specific concerns with the WWC initial eligibility 
screening process. First, the protocols for some topic areas (including the Dropout 
Prevention topic area) would lead reviewers to eliminate otherwise-eligible studies 
because those studies did not report group means and standard deviations. This may not 
be necessary if effect sizes can be calculated from other information. Second, the use of 
a single screener at the initial eligibility stage may lead to “over-rejection” of potentially 
eligible studies.^5 Recent studies have concluded that “it is desirable for more than one
[reviewer] to repeat parts of the [screening] process” (Higgins & Green 2008) to check
the reliability of the screening process.
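
As an illustration of the first concern, a standardized mean difference can often be
recovered from a reported test statistic even when group means and standard deviations
are missing. The sketch below uses the textbook conversion d = t * sqrt(1/n1 + 1/n2) for
an independent-samples t-test. The function name and inputs are hypothetical, and this is
not a description of any procedure WWC itself uses.

```python
import math


def d_from_t(t: float, n_treat: int, n_comp: int) -> float:
    """Standardized mean difference implied by an independent-samples t statistic.

    Assumes a two-group comparison with a pooled-variance t-test; other designs
    (e.g., clustered or covariate-adjusted analyses) need different conversions.
    """
    if n_treat < 2 or n_comp < 2:
        raise ValueError("each group needs at least two observations")
    return t * math.sqrt(1.0 / n_treat + 1.0 / n_comp)


# Example: a study reports t = 2.0 with 50 students per group.
effect_size = d_from_t(2.0, 50, 50)
```

A study reporting only a t statistic and group sizes could in this way still contribute an
effect size rather than being screened out.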

f. Review and Classification of Studies Passing Initial Eligibility Screen

Studies that have passed the WWC initial eligibility screening move to Stage 2 of the 
WWC review process. In this stage each study is reviewed independently by two 
reviewers using a Study Review Guide. The two reviews are combined by a senior 
reviewer, sometimes using additional information obtained from direct queries to the 
author of a study. The primary goal of the Stage 2 process is to classify eligible studies 
into three mutually exclusive categories: “Meets Evidence Standards,” “Meets Evidence 
Standards with Reservations,” or “Does not Meet Evidence Standards.” The use of two 
independent reviewers is consistent with accepted scientific standards for the conduct of 
high quality systematic reviews. 

The standards for review and classification of studies involve a number of features which 
are specified differently for randomized controlled trials (RCTs) and quasi-experimental 
designs (QEDs). For studies that appear to be RCTs the factors are:

• Randomization 

• Overall attrition 

• Differential attrition 

• Intervention contamination 

• Teacher-intervention confound 

For each factor, a primary standard is established in the Study Review Guide for a study
to be classified as “meeting evidence standards.” A secondary standard is also
established such that if the study falls short of the primary standard, but meets or exceeds
the secondary, then the study is “downgraded.” A study that is downgraded on one factor
is classified as “meeting evidence standards with reservations.” A study that is
downgraded on two or more factors is classified as “not meeting evidence standards.”

^5 Existing research shows that single screeners can miss up to 24 percent of eligible
studies (on average, screeners missed 8 percent of eligible studies; Edwards et al., 2002).
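
The downgrading rule for RCTs described in this section can be sketched in a few lines.
This is a minimal illustration only: the factor identifiers and the function name are
hypothetical, and the sketch takes as given the set of factors on which reviewers have
already decided to downgrade a study.

```python
RCT_FACTORS = (
    "randomization",
    "overall_attrition",
    "differential_attrition",
    "intervention_contamination",
    "teacher_intervention_confound",
)


def classify_rct(downgraded_factors):
    """Apply the count-based rule: 0 downgrades, 1 downgrade, or 2 or more."""
    downgraded = set(downgraded_factors)
    unknown = downgraded - set(RCT_FACTORS)
    if unknown:
        raise ValueError(f"unknown factors: {sorted(unknown)}")
    if len(downgraded) == 0:
        return "Meets Evidence Standards"
    if len(downgraded) == 1:
        return "Meets Evidence Standards with Reservations"
    return "Does Not Meet Evidence Standards"
```

Because the rule only counts downgrades, two minor shortfalls are treated the same as two
serious ones.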

The primary standard for randomization was adjusted as of January 1, 2007, to require 
that the study provide specific information on the assignment process to establish that the 
assignment process was random or functionally random. Prior to that date the standard 
required only that the author claim “random assignment.” The panel agrees that the 
current standard is appropriate, although it notes that the precise definition of random 
assignment in a classroom setting, where students and teachers are both assigned to 
treatment or control arms, should be spelled out in detail. The panel also agrees that the 
inclusion of studies with functionally random assignment is justified, provided that 
“functionally random assignment” applies to teacher/classroom assignments to treatment 
and control arms, and that the study demonstrates equivalence using informative
pre-assignment characteristics of the students and teachers assigned to the two groups.

The primary standards for overall and differential attrition were set at different thresholds 
for different topic areas. No standards were defined for beginning reading; the thresholds 
for early childhood interventions were 20 percent overall, 40 percent within cluster, and 7 
percent differential; the thresholds for dropout prevention intervention studies were 30 
percent overall and 5 percent differential. The panel notes that there is no standard 
threshold of attrition in the literature but also believes that there is little scientific basis 
for relaxing the standard of evidence in different topic areas. 
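
To make the consequence of topic-specific thresholds concrete, the sketch below applies
the overall and differential thresholds quoted above to a single hypothetical study. The
attrition formulas (share of assigned units lost, and the absolute gap between the two
arms) and the strict "greater than" comparison are assumptions made for illustration; the
protocols may define these quantities differently. The within-cluster threshold is omitted.

```python
# Thresholds as quoted in the text; beginning reading had none defined.
ATTRITION_THRESHOLDS = {
    "early_childhood": {"overall": 0.20, "differential": 0.07},
    "dropout_prevention": {"overall": 0.30, "differential": 0.05},
}


def attrition_rates(assigned_t, analyzed_t, assigned_c, analyzed_c):
    """Return (overall, differential) attrition as proportions."""
    rate_t = 1 - analyzed_t / assigned_t
    rate_c = 1 - analyzed_c / assigned_c
    overall = 1 - (analyzed_t + analyzed_c) / (assigned_t + assigned_c)
    return overall, abs(rate_t - rate_c)


def exceeds_thresholds(topic, overall, differential):
    """True if either rate is above the topic area's threshold (assumed strict)."""
    limits = ATTRITION_THRESHOLDS[topic]
    return overall > limits["overall"] or differential > limits["differential"]


# Hypothetical study: 100 assigned per arm, 77 and 73 analyzed.
overall, differential = attrition_rates(100, 77, 100, 73)
```

With 25 percent overall and 4 percentage points differential attrition, this hypothetical
study would exceed the early childhood thresholds but not the dropout prevention ones,
which is the kind of inconsistency the panel questions.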

WWC policy on intervention contamination is that for an RCT to meet evidence 
standards “. . .there should be no evidence of a changed expectancy/novelty/disruption, a 
local history event, or any other intervention contaminants” (Standards and Processes, 
Appendix B). The panel notes that as a practical matter full information on whether such 
disruptions occurred is unlikely to be available, but should be taken into account when 
available. 

Standards for teacher-intervention confound are: (1) if there is only one teacher per
condition and there is no evidence that teacher effects are negligible the study does not 
meet evidence standards; (2) if there is only one teacher per condition and the study 
supplies evidence that teacher effects are “minimal” the study meets evidence standards 
with reservations; and (3) if there is more than one teacher per condition, or one teacher 
per condition with strong evidence that teacher effects are negligible the study meets 
evidence standards. The panel agrees with the basic principle of down-weighting 
reported evidence from studies with potential teacher confounding, although it also notes 
that the “strikes against” downgrading process is inherently arbitrary. In the opinion of 
the panel, case (2) - with only one teacher per condition - is inherently a weak design 
and arguably fails to meet the standards of evidence. 

The protocols and Study Review Guides also specify adjustments to be made to
estimates of statistical significance that correct for mismatch between unit of assignment
and unit of analysis (e.g., classes are randomly assigned but the analysis is conducted on
student data without clustering by class). These are discussed further in the next section.

For longitudinal QEDs the factors are: 

• Equating and baseline equivalence

• Overall attrition

• Differential attrition

• Intervention contamination

• Teacher-intervention confound

• Mismatch between unit of assignment and unit of analysis

Unlike RCTs, the highest classification that can be achieved by a QED is “meeting
evidence standards with reservations.” Studies that fail to meet the (primary) standard for
any of the factors are classified as “not meeting evidence standards.” The standards for
attrition, intervention confound and teacher confound are essentially the same as the
standards for RCTs. The primary difference in standards is that for QEDs the study must
establish equivalence of the treatment and comparison groups using a pre-test or proxy of
a pre-test (and in some cases other characteristics, as specified in the topic area protocol).
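
The classification rule for QEDs stated above can be sketched as follows. The factor
identifiers and function name are hypothetical, and the sketch assumes reviewers have
already judged which factors a study fails.

```python
QED_FACTORS = (
    "equating_and_baseline_equivalence",
    "overall_attrition",
    "differential_attrition",
    "intervention_contamination",
    "teacher_intervention_confound",
    "unit_of_assignment_analysis_mismatch",
)


def classify_qed(failed_factors):
    """A QED failing any factor is rejected; otherwise it is capped at 'with reservations'."""
    failed = set(failed_factors)
    unknown = failed - set(QED_FACTORS)
    if unknown:
        raise ValueError(f"unknown factors: {sorted(unknown)}")
    if failed:
        return "Does Not Meet Evidence Standards"
    return "Meets Evidence Standards with Reservations"
```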

The panel agrees with the general principle that well-implemented RCTs represent the
strongest form of evidence that is available on the effectiveness of education
interventions. We also agree that useful information likely can be gleaned from RCTs
with relatively minor flaws in design or implementation, whereas RCTs with substantial
flaws are less likely to provide such information. The panel also agrees that
well-implemented quasi-experimental studies that compare post-intervention outcomes for a
treated group and a comparison group that are closely equated on a pre-test (or other
close proxy of the main outcome) provide potentially useful evidence on the effectiveness
of education interventions, albeit of lower strength than from the best RCTs. Thus, while
we believe the WWC’s “two strikes” standard for RCTs and “one strike” standard for
QEDs is arbitrary, we conclude that the WWC grading system allows meaningful
information to be extracted and presented to users of the WWC.

g. Extraction and Synthesis of Estimated Effects

The methods and procedures for the extraction and synthesis of results from eligible
studies are specified in a document entitled "Technical Details of WWC-Conducted
Computations" (http://ies.ed.gov/ncee/wwc/references/iDocViewer/Doc.aspx?docId=9).
A link to this document is usually found in the respective intervention reports. The
computations described in this document are consistent with the standard prescriptions
for effect size estimation for continuous and dichotomous outcomes, respectively. In the
case studies reviewed by the panel, the application of these procedures for converting
study outcomes to effect sizes was appropriate. However, the panel was unable to
determine how these procedures were used or modified under nonstandard conditions
(e.g., when distributional assumptions were not satisfied).
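
For readers unfamiliar with such computations, the sketch below shows two widely used
effect size estimators: Hedges' g (a standardized mean difference with a small-sample
correction) for continuous outcomes, and a log odds ratio rescaled by 1.65 (often called
the Cox index) for dichotomous outcomes. These are offered as common textbook choices
under the assumption of two independent groups; the sketch does not reproduce the WWC
technical document, and the function names are hypothetical.

```python
import math


def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Standardized mean difference with the usual small-sample correction."""
    df = n_t + n_c - 2
    pooled_sd = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / df)
    d = (mean_t - mean_c) / pooled_sd
    correction = 1 - 3 / (4 * df - 1)  # approximate bias correction factor
    return correction * d


def cox_index(p_t, p_c):
    """Log odds ratio divided by 1.65, for proportions strictly between 0 and 1."""
    if not (0 < p_t < 1 and 0 < p_c < 1):
        raise ValueError("proportions must lie strictly between 0 and 1")
    log_odds_ratio = math.log(p_t / (1 - p_t)) - math.log(p_c / (1 - p_c))
    return log_odds_ratio / 1.65


# Example: treatment mean 105, comparison mean 100, SD 10 and n = 50 in each group.
g = hedges_g(105, 10, 50, 100, 10, 50)
```

Both estimators rely on distributional assumptions (roughly normal outcomes, or an
underlying logistic distribution), which is why nonstandard conditions need separate
handling.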

WWC specifies an adjustment to be made to estimates of statistical significance of
estimated effects that corrects for mismatch between unit of assignment and unit of
analysis (i.e., when classes are randomly assigned to treatment but the original analysis
was conducted on individual level student data without clustering by class). This
approach is based on a procedure suggested by Hedges (2007). This is only an
approximate adjustment, and does not take into account study-specific factors that could
lead to a larger or smaller adjustment than the one prescribed by Hedges’ approach. For
this reason the panel believes that a study-specific correction, implemented by the study
authors using the actual study data, would be preferable.
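
The general idea behind an ex post clustering adjustment can be illustrated with the
familiar design-effect approximation, in which a naive test statistic is deflated by
sqrt(1 + (m - 1) * ICC) for clusters of equal size m. This is a simplified stand-in chosen
for illustration and is not the Hedges (2007) procedure; the intraclass correlation (ICC)
value is an assumed input, and the sketch ignores the accompanying change in degrees of
freedom.

```python
import math


def design_effect(cluster_size: float, icc: float) -> float:
    """Variance inflation from clustering, assuming equal-sized clusters."""
    if cluster_size < 1 or not (0 <= icc <= 1):
        raise ValueError("need cluster_size >= 1 and 0 <= icc <= 1")
    return 1 + (cluster_size - 1) * icc


def deflate_t(t: float, cluster_size: float, icc: float) -> float:
    """Shrink a student-level t statistic to reflect class-level assignment."""
    return t / math.sqrt(design_effect(cluster_size, icc))


# Example: t = 3.0 from a student-level analysis, 21 students per class, assumed ICC 0.2.
adjusted_t = deflate_t(3.0, 21, 0.2)
```

The size of the adjustment depends heavily on the assumed ICC, which illustrates why a
correction computed from the actual study data may differ from a generic one.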

WWC protocols specify the use of Benjamini-Hochberg corrections when multiple 
comparisons are reported in a study. These procedures appear to have been implemented 
in summarizing results for several beginning reading and early childhood interventions. 
The panel notes that other correction procedures are potentially preferable in certain 
situations. 
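
For reference, the Benjamini-Hochberg step-up procedure can be sketched as below. This
is the standard textbook form of the procedure for controlling the false discovery rate;
the function name and the alpha level of 0.05 are illustrative choices, not details taken
from the WWC protocols.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected.

    Sort the m p-values, find the largest rank k with p_(k) <= k * alpha / m,
    and reject the k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            largest_k = rank
    rejected = [False] * m
    for idx in order[:largest_k]:
        rejected[idx] = True
    return rejected


# Example: four outcome comparisons reported in one study.
flags = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

In this example only the first comparison remains significant after correction, although
three of the four unadjusted p-values fall below 0.05.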

Procedures for combining effect sizes across studies (i.e., methods for synthesizing 
results) are not specified, perhaps because very few interventions have more than one or 
two studies to combine. The panel notes that in the future WWC will have to develop 
standards for combining evidence from multiple studies, including the issues of how to 
compare effect sizes across studies that vary the intensity of an intervention, and across 
studies with different control conditions. 



h. Reporting of Evidence on the Effectiveness of Interventions 

WWC reports its overall findings in a highly transparent and timely manner, after outside 
experts have conducted a review. These results are reported on the WWC website and 
are readily available for policymakers and practitioners. The presentation is intended to 
be clear to all audiences, calibrated against standards that have been set by WWC, and 
scientifically sound. 

The panel believes that for the most part, WWC achieves these objectives in its reviews. 
All systematic reviews have to make important decisions regarding the quality of the
studies they consider, the degree of missing data they tolerate, the method used to
measure effect sizes, and the cut-offs required to achieve the
highest level of impact. The panel believes that WWC has done a reasonable job of
making many of these critical decisions, and a very good job of applying these standards 
once they have been set. 

i. Overall Evaluation of the WWC Process 

Overall, the panel believes that the WWC review and processes are based on 
scientifically appropriate methodologies for the task of judging the strength of the 
evidence regarding the effectiveness of the interventions identified in the topic areas, 
although the panel did not have time or resources to fully investigate the application of 
these methodologies in every review. Moreover, the panel believes that the Intervention
and Topic Area Reports provide succinct and meaningful summaries of the evidence on
the effectiveness of specific interventions.

IV. Recommendations



1. Full Review. The panel recommends that IES commission a full review of the What
Works Clearinghouse, including a review of the Clearinghouse’s mission, and of the
WWC Practice Guides, which we have not attempted to evaluate. The panel also
recommends that IES consider instituting a regular review process to ensure that WWC is
using the most appropriate standards in its work.

2. Protocol Templates. The panel recommends that the WWC review and update the
protocol templates, focusing on the following issues:

(i) standards for crossover and assignment noncompliance, and for adjusting
intention-to-treat effects across studies.

(ii) standards for documenting the program received in the control arm of RCTs (or by
members of the comparison group in QEDs), and potentially incorporating this
information in making comparisons across studies and/or interventions.

(iii) revised standards for multiple comparisons. We recommend that WWC review the
treatment of multiple comparisons in light of the recent research report by Peter Schochet
entitled “Guidelines for Multiple Testing in Impact Evaluations.”

(iv) attrition standards. We recommend that WWC reconsider the current process of
setting different attrition standards in different topic areas.

(v) potential conflicts of interest. We recommend that WWC establish a new protocol to
keep track of potential conflicts of interest, such as cases where a study is funded or
conducted by a program developer, and consider making that information available in its
reports.

(vi) randomization. We recommend that the WWC precisely define the standards for
“randomization” in a multi-level setting.

3. Documentation of Search Process. The panel recommends that the WWC expand the
protocol templates to specify more explicit documentation of the actual search process
used in each topic area, and maintain a record of the results of the search process that can
be used to guide decision making on future modifications of the search process.

4. Reliability of Eligibility Screening. The panel recommends that the WWC conduct
regular studies of the reliability of the eligibility screening process, using two
independent screeners, and use the results from these studies to refine the eligibility
screening rules and screening practices.

5. Documentation of Screening Process. The panel recommends that WWC reports
include a QUOROM-type flow chart documenting the flow of studies through each 
review and number of studies excluded at each point, and a Table of Excluded Studies, 
listing specific reasons for exclusion for each study. 

6. Misalignment Adjustment. The panel recommends that in cases where a study analysis 
is "misaligned," WWC staff request that study authors re-analyze their data correctly, 
taking into account the unit of randomization and clustering. We recommend that the 
results from the process be compared to the simple ex post adjustment procedure 
currently specified, to develop evidence on the validity of the latter. 

7. Combining Evidence Across Multiple Studies. We recommend that WWC re-evaluate 
procedures for combining evidence across studies, with specific attention to the issue of 
how the rules for combining evidence can be optimally tuned, given the objectives of the 
WWC review process and the sample sizes in typical studies for a topic area. 

8. Reporting. 

(i) The panel recommends that published reports on the website include the topic area 
protocols, as well as more information on the screening process results that led to the set 
of eligible studies actually summarized in the Topic Area reports. 

(ii) The panel recommends that WWC make readily available its “Standards and 
Procedures” manual, including appendices, as well as all other relevant documents that 
establish and document its policies and procedures. 

9. Practice Guides. The panel recommends that the Practice Guides - which contain 
material that does not meet the high standards of evidence for other WWC products - be 
clearly separated from the Topic and Intervention Reports. 

10. Outreach and Collaboration with Other Organizations. The panel recommends that 
the WWC build and maintain a relationship with national and international organizations 
focusing on systematic reviews, specifically with the goals of having Review Team 
leaders engaged in the broader scientific community, and in bringing the latest standards 
and practices to the WWC. The panel also recommends that the WWC convene working 
groups with a mixture of researchers (including specialists in education research and 
systematic reviews) to address the development of new standards for the review and 
synthesis of studies. 







Appendix A: Panel Charge 



DEPARTMENTS OF LABOR, HEALTH AND HUMAN SERVICES, AND
EDUCATION, AND RELATED AGENCIES APPROPRIATION BILL, 2009
REPORT OF THE COMMITTEE ON APPROPRIATIONS
U.S. SENATE, 

ON S. 3230 
JULY 8, 2008. 

The Committee requests that the National Board for Education Sciences, as the body
responsible for oversight of the Institute of Education Sciences, convene a blue-ribbon
panel of leading experts in rigorous, particularly randomized, evaluations to assess the
What Works Clearinghouse. While the Committee believes a comprehensive assessment
should be undertaken given the significant investment made in the Clearinghouse, an
immediate priority should be a focused study addressing the fundamental question of
whether the Clearinghouse's evidence review process and reports are scientifically valid -
that is, provide accurate information about the strength of evidence of meaningful effects
on important educational outcomes. The Committee requests that the Board convene the
panel within 60 days of enactment of this act, and that the panel complete its work and
submit a report, including any recommendations for improvements in the Clearinghouse,
to the Board, the Director, and Congress no later than 4 months thereafter. The
Committee intends for panel members to be free of conflicts of interest.